free-text radiology report
LLM-CXR: Instruction-Finetuned LLM for CXR Image Understanding and Generation
Suhyeon Lee, Won Jun Kim, Jinho Chang, Jong Chul Ye
Following the impressive development of LLMs, vision-language alignment in LLMs is being actively researched to enable multimodal reasoning and visual input/output. This direction of research is particularly relevant to medical imaging, because accurate medical image analysis and generation require reasoning that combines visual features with prior knowledge. Many recent works have focused on training adapter networks that serve as an information bridge between image processing (encoding or generating) networks and LLMs. Presumably, however, realizing the full reasoning potential of LLMs on visual information requires that visual and language features be allowed to interact more freely. This is especially important in the medical domain, because understanding and generating medical images such as chest X-rays (CXRs) requires not only accurate visual and language-based reasoning but also a closer mapping between the two modalities. Thus, taking inspiration from previous work that combines a transformer with VQ-GAN for bidirectional image and text generation, we build upon this approach and develop a method for instruction-tuning an LLM pretrained only on text so that it gains vision-language capabilities for medical images. Specifically, we leverage a pretrained LLM's existing question-answering and instruction-following abilities to teach it to understand visual inputs by instructing it to answer questions about image inputs and, symmetrically, to output both text and image responses appropriate to a given query, by tuning the LLM on diverse tasks that encompass image-based text generation and text-based image generation. We show that our model, LLM-CXR, trained with this approach, achieves better image-text alignment in both CXR understanding and generation tasks while being smaller than previously developed models that perform a narrower range of tasks.
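The abstract does not spell out how images are turned into something an LLM can read and write, so the following is only a minimal sketch of the general transformer plus VQ-GAN idea it builds on, under stated assumptions: per-patch image features are snapped to the nearest entry of a codebook, the resulting code indices are appended to the LLM's vocabulary as new token IDs, and each image-report pair becomes one image-to-text and one text-to-image instruction example. `TEXT_VOCAB_SIZE`, the `<img_k>` token spelling, the prompt wording, and every function name are hypothetical; no real VQ-GAN encoder or LLM is involved, and this is not LLM-CXR's actual code.

```python
import math

# Hypothetical sketch of VQ-style image tokenization; not LLM-CXR's actual code.
# Hypothetical size of the LLM's original text vocabulary (the abstract gives no value).
TEXT_VOCAB_SIZE = 50_000


def nearest_code(vec, codebook):
    """Index of the codebook vector closest to vec (squared Euclidean distance)."""
    best_idx, best_dist = 0, math.inf
    for idx, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def quantize(patch_features, codebook):
    """Vector-quantize per-patch features (assumed VQ-GAN encoder output) to code indices."""
    return [nearest_code(v, codebook) for v in patch_features]


def codes_to_token_ids(codes):
    """Shift code indices past the text vocabulary so image tokens get new, unused IDs."""
    return [TEXT_VOCAB_SIZE + c for c in codes]


def token_ids_to_codes(token_ids):
    """Inverse mapping: keep only the image tokens in a sequence and recover their codes."""
    return [t - TEXT_VOCAB_SIZE for t in token_ids if t >= TEXT_VOCAB_SIZE]


def make_instruction_pairs(codes, report):
    """One image-to-text and one text-to-image example (hypothetical prompt wording)."""
    image_str = " ".join(f"<img_{c}>" for c in codes)
    return [
        {"instruction": "Write a radiology report for this chest X-ray.",
         "input": image_str, "output": report},
        {"instruction": "Generate a chest X-ray that matches this report.",
         "input": report, "output": image_str},
    ]


# Usage with a toy 2-D codebook (real codebooks are learned and far larger).
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
codes = quantize([[0.9, 1.1], [4.0, 4.5], [0.1, -0.1]], codebook)
print(codes)                      # [1, 2, 0]
print(codes_to_token_ids(codes))  # [50001, 50002, 50000]
```

Offsetting image codes past the text vocabulary lets a single autoregressive model read and emit both modalities through the same next-token objective; turning generated image tokens back into pixels would be the job of a VQ-GAN decoder, which this sketch does not include.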
The last few years have seen remarkable development in the field of large language models (LLMs). LLMs are considered a distinct class of AI models because they can flexibly understand and generate natural language and perform language-based reasoning, which allows them to generalize to a variety of tasks without being explicitly trained for them. As a next step, methods that let LLMs take visual information as input alongside language (OpenAI, 2023; Liu et al., 2023; Alayrac et al., 2022; Li et al., 2023), as well as methods that output images from LLMs (Koh et al., 2023a;b), are being actively developed. These models have great potential in the medical domain, because working with medical images such as chest X-rays (CXRs) requires the ability to understand context, perform reasoning, and communicate conclusions in both image and text form.
"Just Accepted" papers have undergone full peer review and have been accepted for publication in Radiology: Artificial Intelligence. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content. To automatically identify a cohort of patients with pancreatic cystic lesions (PCLs) and to extract PCL measurements from historical computed tomographic (CT) and magnetic resonance (MR) imaging reports using natural language processing (NLP) and a question answering system. Institutional review board approval was obtained for this retrospective HIPAA-compliant study and the requirement to obtain informed consent was waived.
Deep-learning classifier understands free-text radiology reports
Free-text radiology reports can be automatically classified by convolutional neural networks (CNNs) powered by deep-learning algorithms with accuracy equal to or better than that achieved by traditional (and decidedly labor-intensive) natural language processing (NLP) methods. That's the conclusion of researchers led by Matthew Lungren, MD, MPH, of Stanford University. The team tested a CNN model they developed for mining pulmonary embolism findings from thoracic CT reports generated at two institutions. Radiology published the study, lead-authored by Matthew Chen, MS, also of Stanford, online Nov. 13. The researchers analyzed annotations made by two radiologists for the presence, chronicity, and location of pulmonary embolisms, then compared their CNN's performance with that of PeFinder, an NLP model considered highly proficient at this task. They note that PeFinder and similar existing NLP techniques demand a "relatively high burden of development, including domain-specific feature engineering, complex annotations and laborious coding for specific tasks."
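The article does not describe the Stanford model's architecture. A common design for CNN text classifiers (assumed here, not taken from the study) embeds each word, slides filters of a few widths over the word sequence, keeps the maximum ReLU activation of each filter (max-over-time pooling), and feeds those features to a logistic output. The pure-Python sketch below shows only that forward pass for a single binary label, such as presence of pulmonary embolism. `TinyReportCNN`, its vocabulary, and its sizes are hypothetical, and the weights are random, so its probabilities carry no meaning until the model is trained on annotated reports.

```python
import math
import random
import re


class TinyReportCNN:
    """Hypothetical, untrained single-label CNN text classifier (forward pass only)."""

    def __init__(self, vocab, emb_dim=8, widths=(2, 3), filters_per_width=4, seed=0):
        rng = random.Random(seed)
        self.emb_dim = emb_dim
        self.index = {w: i + 1 for i, w in enumerate(vocab)}  # index 0 = unknown word
        self.emb = [[rng.gauss(0, 0.1) for _ in range(emb_dim)] for _ in range(len(vocab) + 1)]
        self.filters = []  # (width, weights[width][emb_dim], bias)
        for w in widths:
            for _ in range(filters_per_width):
                weights = [[rng.gauss(0, 0.1) for _ in range(emb_dim)] for _ in range(w)]
                self.filters.append((w, weights, 0.0))
        self.out_w = [rng.gauss(0, 0.1) for _ in self.filters]
        self.out_b = 0.0

    def embed(self, text):
        """Lowercase word tokens mapped to embedding vectors."""
        ids = [self.index.get(t, 0) for t in re.findall(r"[a-z]+", text.lower())]
        return [self.emb[i] for i in ids]

    def features(self, text):
        """Max-over-time pooled ReLU activation of every filter."""
        x = self.embed(text)
        widest = max(w for w, _, _ in self.filters)
        # Zero-pad short reports so every filter fits at least once.
        x = x + [[0.0] * self.emb_dim for _ in range(max(0, widest - len(x)))]
        feats = []
        for w, weights, bias in self.filters:
            best = 0.0  # starting at 0.0 makes max(best, s) equal to ReLU followed by max-pooling
            for start in range(len(x) - w + 1):
                s = bias + sum(
                    weights[k][d] * x[start + k][d]
                    for k in range(w)
                    for d in range(self.emb_dim)
                )
                best = max(best, s)
            feats.append(best)
        return feats

    def predict_proba(self, text):
        """Sigmoid of a linear layer over the pooled features."""
        z = self.out_b + sum(wi * fi for wi, fi in zip(self.out_w, self.features(text)))
        return 1.0 / (1.0 + math.exp(-z))


vocab = ["acute", "pulmonary", "embolism", "chronic", "filling", "defect", "no"]
model = TinyReportCNN(vocab)
print(model.predict_proba("Acute pulmonary embolism in the right lower lobe."))
```

The contrast with a PeFinder-style pipeline is that the filters learn which word patterns matter from annotated reports instead of relying on domain-specific feature engineering; labels for chronicity and location would need additional output heads, which this sketch omits.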